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^ Abstract 

m 

^^ Contact tracing data collected from disease outbreaks has received relatively 

m 

^D little attention in the epidemic modelling literature because it is thought to be un- 



reliable: infection sources might be wrongly attributed, or data might be missing 
due to resource contraints in the questionnaire exercise. Nevertheless, these data 



y might provide a rich source of information on disease transmission rate. This paper 

presents novel methodology for combining contact tracing data with rate-based con- 
tact network data to improve posterior precision, and therefore predictive accuracy. 
We present an advancement in Bayesian inference for epidemics that assimilates 
these data, and is robust to partial contact tracing. Using a simulation study based 
on the British poultry industry, we show how the presence of contact tracing data 
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improves posterior predictive accuracy, and can directly inform a more effective 
control strategy. 

Keywords: Epidemic, Bayesian, reversible jump MCMC, avian influenza, contact tracing 

1 Introduction 

In a world in which people and animals move with increasing frequency and dis- 
tance, authorities must respond to disease outbreaks at maximum efficiency according to 
economic, social, and political pressures. Field epidemiologists are typically faced with 
making decisions based on imperfect and heterogeneous data sources. European Eco- 
nomic Community (1992) Council Directive 92/119/EEC requires member states to iden- 
tify "other holdings... which may have become infected or contaminated"; this is achieved 
through contact tracing, performed by the competent authority - for instance Defra's 
Framework Plan for Exotic Animal Diseases (Defra, 2009). Though resource limitations 
might prevent the implementation of efficient case detection and removal based on contact 
tracing data (CTD), the presence of CTD might provide a rich source of information, both 
on contact frequency and probability of infection given a contact. In a practical setting, 
however, rigorous collection of contact tracing information from all detected cases is diffi- 
cult to achieve; more typically, it will be collected from as many individuals as resources 
allow. Therefore, for a given epidemic the amount of contact tracing data available for 
analysis is not guaranteed in advance, the challenge for the field epidemiologist being 
how best to use these data to investigate disease dynamics, and inform disease control 
decisions. 

This paper introduces a Bayesian approach to the assimilation of CTD into model- 
based inference and control for infectious diseases based on spatiotemporal case data. This 
can be applied to general classes of continuous time epidemic models, and we provide 
illustration using a flexible epidemic modelling framework as described in O'Neill and 



Roberts (1999) and Jewell et al. (2009b, a). Our methodology is particularly useful since it 
automatically adapts to the information contained in CTD, over and above that contained 
in the adjunct timeseries. 

In advance of an outbreak, network and spatiotemporal dynamical models of disease 
have become a standard for evaluating the likely effect of various control measures, and 
are now routinely used for contingency planning. Using demographic data as well as case 
time series data, previous work has shown that real time estimation of epidemic model 
parameters can potentially be highly effective in informing decision-making (Cauchemez 
et al., 2006; Jewell et al., 2009a). In the early stages of an epidemic case data are often 
sparse, though this is when choosing the right course of control strategy is most criti- 
cal (Anderson and May, 1991). Therefore, decisions must often be made in the face of 
considerable uncertainty. For such a statistical decision support tool, assimilating dif- 
ferent sources of data is often helpful in maximising statistical information, leading to 
more accurate model predictions and improved decision-making throughout the epidemic 
(Presanis et al., 2010). Nevertheless, incorporating diverse data sources into a tractable 
likelihood function commonly presents a methodological problem (see for example Diggle 
and Elliott (1995)). 

Much has been written on how contact tracing may be used to decrease the time 
between infection and detection (notification) during epidemics. However, this focuses 
on the theoretical aspects of how contact tracing efficiency is related to both epidemic 
dynamics and population structure. It has been shown that, providing the efficiency of 
following up any contacts to look for signs of disease is high, this is a highly effective 
method of slowing the spread of an epidemic, and finally containing it (see for example 
Eames and Keeling (2003); Kiss et al. (2005); Klinkenberg et al. (2006)). In contrast, the 
use of CTD in inferring epidemic dynamics does not appear to have been well exploited. 



During the UK outbreak of foot and mouth disease in 2001, the Ministry of Agriculture, 
Farming, and Fisheries (now Defra) inferred a spatial risk kernel by assuming that the 
source of infection was correctly identified by the field investigators, giving an empirical 
estimate of the probability of infection as a function of distance (Ferguson et al., 2001; 
Keeling et al., 2001; Savill et al., 2006). Strikingly, this shows a high degree of similarity to 
spatial kernel estimates based on the statistical techniques of Diggle (2006) and Kypraios 
(2007) without using contact tracing information. However, Cauchemez et al. (2006) make 
the point that the analysis of imperfect CTD requires more complex statistical approaches, 
although they abandon contact tracing information altogether in their analysis of the 
2003 SARS epidemic in China. In human health, Blum and Tran (2010) devise a model 
that incorporates contact tracing for HIV-AIDS by dividing cases into those detected 
by random sampling or contact tracing. This allows them to estimate the proportion 
of undetected cases in the population at a particular observation time, although the 
construction of the model does not allow them to estimate the probability of infection 
given a contact. 

Addressing this, we present a model which uses contact tracing information between 
pairs of individuals for time periods when it is available, and falls back on a Poisson 
process likelihood for time periods when it is not. We illustrate our methodology using 
an example from the UK agricultural industry. 

In Section 2, we describe a motivating example of a potential outbreak of avian in- 
fluenza in the British poultry industry. In Section 3, the existing approach to continuous 
time epidemic models is reviewed. Section 4 then presents our new class of embedded 
model for incorporating CTD. In Section 5 we develop a reversible jump Markov Chain 
Monte Carlo algorithm to estimate the posterior density, showing the feasibility of exact 
Bayesian computation for problems of this type. Section 6 then applies this methodology 



to the avian influenza example, showing how this methodology might be used in the event 
of an outbreak of high pathogenicity avian influenza. 

2 Motivating Example 

One motivation for this paper stems from the example of a potential avian influenza 
outbreak in the British poultry industry (Jewell et al., 2009a), using data from Defra's 
Great Britain Poultry Register as well as contact network data (Dent et al., 2008; Sharkey 
et al., 2008). We identify three contact networks contained in the data with different 
levels of information. The company association network is "static" : an edge is present if 
and only if two poultry holding belong to the same production company. The feed mill 
delivery and slaughterhouse networks are "dynamic" meaning they contain edge frequency 
information - for example, we know not only that two holdings are connected by a feed 
mill delivery, but how often a feed lorry runs between them. Geographic coordinates for 
each holding allow us to consider non-network spatially related transmission. We also 
consider "background" infection sources, accounting for infections unexplained by the 
other transmission modes. 

The presence of explicit contact networks within the poultry industry presents an 
interesting opportunity to investigate the possibility of including CTD into the SIR-type 
epidemic model (Kermack and McKendrick, 1927). Where contact frequency data is 
available, we can view the networks as dynamic with contacts occurring according to a 
Poisson process with intensity equal to the frequency along a directed edge of the network. 
Normally this process is not observed, and we only have a mean contact intensity between 
individuals to use as covariate data in a statistical analysis. However, the collection of 
CTD presents the possibility for making inference from direct observations of the contact 
process, albeit for a select period of time leading up to a case detection. 

In the typical livestock setting, CTD is gathered in response to the notification (ie case 



detection) of an infectious premises (IP). The resulting data are a list of contacts (with 
time, source, destination, and type) that have been made in and out of the IP during a 
period prior to the notification. The length of this period - the "contact tracing window" 
- during which contact tracing is gathered is stipulated by policy, and is typically longer 
than the expected infection to detection time for the disease in question. 

Though somewhat idealised here, these data provide us a platform from which to 
develop the methodology required to make use of CTD for inference on epidemic models. 
In livestock epidemics, CTD is not routinely passed to the modelling community, and it 
is hoped that the results presented in this paper will encourage the authorities to do so. 

3 A review of continuous time epidemic models 

To begin, we provide an overview of the SINR epidemic model for heterogeneously- 
mixing populations (see also Jewell et al. (2009b)). The SIR model is extended by consid- 
ering the population to be composed of individuals who, at any time t, exist in one of four 
states: Susceptible, Infected, Notified, and Removed. Progression through these states 
is assumed to be serial: individuals begin as susceptible, become infected, are notified 
(ie disease is detected), and are finally removed from the population (either by death, or 
life- long immunity). The term "individual" refers to an epidemiologically discrete unit, 
which might be a person or animal, or might be of higher order, such as a household or 
farm (as is the case for our HPAI example). It is then assumed that individuals become 
infected via transmission modes k = 1, . . . , K which might be contact network or spatial 
proximity to infected individuals. Independent "background" sources of infection are also 
included to account for infection sources other than those explicitly modelled. 

Conditioning on the initial infective k, individuals j are assumed to become infected 
according to a time-inhomogeneous Poisson process with instantaneous rate Xj{t) equal to 
the sum of the infection rates across all transmission modes from all infected and notified 
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individuals present in the population. Let S{t), I{t), M{t), and TZ{t) be the sets of 
susceptible, infected, notified, and removed individuals at time t, with the restriction that 
the entire population V = S{t) UX(t) [Jj\f{t) U 7^(t). Furthermore, let I, N, and R the 
corresponding vectors of individuals' infection, notification, and removal times. Infection 
times are, of course, never directly observed and we therefore treat / as missing data. Since 
the aim of the methodology is to make inference on an epidemic in progress observed at 
time Tabs, infected individuals are partitioned into j : Nj > Tobs occult (ie undetected) 
infections with right censored notification and removal times, and j : Nj < Tobs known 
(ie detected) infections. Given the disease transmission parameters = {e, /3,7,p} (see 
below), the likelihood function (here denoted by -^yi(O) ^^ the epidemic process up to the 
analysis time Tobs is 



LA{I,N,R\e) = II (A,(/-))-exp 



-'- obs 



n fu{N,-i,)^ n \^-FB{Tobs-h)] (1) 

r-N<T,ts r-Ni>T,ts 



where /li(-) is the pdf, and F£,{-) the corresponding cdf, for the infection to notification 
time. Xj{t) is further decomposed 
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where e represents background infection rate common to all individuals, and 7 measures 
the effect of quarantine measures imposed upon notified individuals. A wide range of 



transmission modes can then be specified, for exampie 



,W 



>^'it) = { 



q{i; ^,t)s{j;r])"^k{i, j'l'ip) for spatiai proximity 
q{i',^,t)s{j;r))/3kC-j ior associative networks (^) 

q{i; ^, t)s{j; 'n)pkrl, for frequency networks. 



In tliese examples, tlie function < q{i;^,t) < 1 represents tlie infectivity of individual 
i at time t, and < s{j;r]) < 1 tlie susceptibility of individual j which is assumed 
constant. The rate of spatial (or environmental) transmission is given by ^k{hj]'4') and 
would typically be a function of Euclidean distance between i and j. Network rates of 
transmission are represented by I3kc\, and Pkr\j ■ The important distinction between the 
latter two terms is that q- represents network associations (0 or 1) between i and j with 
associated infection rate /?/;, whereas rj. represents a vector of (potentially infectious) 
network contact rates with associated probability pt that a contact results in an infection. 

We remark that, if contacts are assumed to occur according to some underlying Poisson 
process with rate r)- , infections occur according to a thinned version of this Poisson 
process, the thinning being governed by pk- We assume that the contact rate is a priori 
independent of the probability of the infection being transmitted. Thus, a contact between 
infected individual i and susceptible j will not necessarily result in j being infected, just 
that there is some positive probability of infection being transmitted. Considering a 
putative full model containing several networks, the information contained in p therefore 
gives a direct measure of risk for each contact network in r jj . 

4 Incorporating Contact tracing data 

Contact tracing data, C, represents a list of all known contacts, both incoming and 
outgoing, that have occurred between newly detected cases j and all other individuals 



during a contact tracing window defined as the interval [Tj,Nj). In other words, these 
data provide a means to observe the underlying contact process driving the infectious 
process as discussed in Section 3. 

Let Cijk{t) be a right continuous counting process describing the number of contacts 
that have occurred between infected individuals i and j along network k up to time t, 
with ACijk{t) = lim<5|o [Cijk{t) — Cijkit — 6)]. If it were possible to obtain CTD for all 
time, as well as perfectly observing individuals' infection states, then inference on p could 
be made using a geometric model with likelihood denoted by Lq{-). For example 



Ln{I,N,R,C\p) = f]exp 



j€V 



/ , / ^Zijk{t)dCijk{t) 
^^ iey(t) fe=i 



n fD{N,-I,)x Yl [l-Fn{n,.-I,)] (4) 

j:N<T„ts j-Nj>T„ts 



with 



Zijk{t) = l[Ij =t]log{q{i;^,Ij-Ii)s{j;'n)pk) + {l-l[Ij > t])log{l-q{i;^,Ij-Ii)s{j;r])pk) 

and y{t) =X{t)\jM{t). 

The limitation to this approach is, of course, that CTD is only available for the contact 
tracing window and so the naive application of such a likelihood will give a biased estimate 
of p. 

To address this, we divide the epidemic into periods of time for which CTD is observed, 
and periods for which it is not. For each individual j in the population, let Wj be the set 
of times for which contact tracing is observed (ie the contact tracing window), and WJ 
the set of times for which it is not. Then, the cr-algebras J^ = a{ACijk{t);t £ Wj) and 
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Q = a{ACijk{t)]t e W|) represent the information contained in CTD, and that lost by 
not having CTD, respectively. 

Theorem 1. Where CTD is only available for contact tracing windows Wj, if the un- 
derlying contact rates r-^ ,k = 1,...,K are known, the likelihood with respect to J-" is 
obtained by taking an expectation with respect to Q such that 



L:p{I,N,R,C\e) 



j:i,ew, 



K 



w. 



/ , / ^Zijk{t)dCijk{t) 



1 i€y{t) k=l 



\j{t)dt 



Wf 



n fD{N,-I,)x n [l-Fn{Toks-m. (5) 



See Supplementary Material A for proof. 

Theorem 1 relies on knowing the underlying contact rate for a particular network. 
For associative and spatial transmission modes (and indeed background pressure) the 
contact rate between individuals, r}- , is not known and these cases do not therefore fit 

(k) 

into our contact tracing framework. However, since the overall infection rate /3fc = Pk^lj , 

(k) 

where it is assumed that r)- is constant with respect to both i and j, we can regard 
infections via these modes as occurring due to an unobserved contact process (Equation 
3). The likelihood function in Equation 5 is therefore sufficiently flexible to include these 
transmission modes in the VV| related terms. 

Finally, we highlight that the likelihood function is written in terms of the data /, N, 
and R, given the transmission parameters 0. However, the epidemic process means that 
infection times are never directly observed. We therefore treat / as missing data, using 
Bayesian data augmentation techniques described in the next section. 
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5 Bayesian Inference 

A Bayesian approach permits the incorporation of prior information early in the epi- 
demic, and provides a natural way to deal with /, the vector of censored infection times. 

Independent priors are assigned to the model parameters 6 = {e,p, /3,7, ■0, ^, £}• 
Typically, Gamma distributions are used for rate parameters (ie e, /3, 7, 0) with Beta 
distributions used for probability parameters (ie p), as demonstrated in Section 6. 

The main features of the adaptive reversible jump MCMC algorithm used to fit the 
model are outlined here, with full implementational details available in the Supplementary 
Material B. An adaptive multisite update (Haario et al., 2001) is used for the disease 
transmission parameters 0, and avoids the necessity for time consuming pilot tuning runs. 
This is useful to speed up the algorithm usage in a real time setting. The implementation 
of reversible jump (Green, 1995) is two fold. Firstly, the dimension of / is allowed to 
expand or contract according to either and addition or deletion update step for occult 
infections from the parameter space (Jewell et al., 2009b). This allows the algorithm to 
explore the possibility occult infections present at the time of analysis consistent with 
the model. Secondly, infection events (including occults) are attributed to either a traced 
contact related to the Wj components of transmission, or spatial/untraced infections 
related to the W| components. The switching in and out of the sets Wj comprises a 
reversible jump move, since it results in the infection appearing in different parts of 
the likelihood. The update step for an infection time therefore proposes the reversible 
jump with probability 0.5, and a contact to contact, or non-contact to non-contact move 
otherwise. 

Shared memory parallel computing (GNU C-|— h using OpenMP) is used to calculate 
the likelihood, ensuring that the algorithm runs in an overnight timeframe (Jewell et al., 
2009b). 
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6 Case study 

To illustrate how this contact tracing methodology might be used in practice, we return 
to the dataset presented in Section 2. First we describe the special case of HPAI within the 
British poultry industry. Since no epidemic outbreak of HPAI has yet occurred in Britain, 
we provide two simulation studies to demonstrate our approach for the given model. We 
then demonstrate how posterior information changes in response to the amount of data 
available from an epidemic, both in terms of the length of the timeseries and the presence 
or absence of CTD. Finally we present a simulation study of 4 outbreak scenarios in which 
different surveillance strategies are employed for early detection of new infections. 

The model For this example, four modes of contact (Feedmill, Slaughterhouse, and 
Company Networks, and Spatial (environmental) transmission) contribute to the infection 
rate between infected or notified individual (ie farm) i and susceptible j. For the Feedmill 
and Slaughterhouse networks, contact frequency information is available, whereas for the 
Company network only the presence or absence of a business association is known. The 
spatial transmission rate is then parameterised as a function of the Euclidean distance 
between the two individuals, centred at 5km to improve MCMC mixing and aid parameter 
interpretability. The transmission modes are therefore 

Xf{t) = q{^;tt)s{J;rJ)P,c^''h[^eI{t)] 
^f{t) = 9(i;C,t)s(j;^)/32e-^(''--^) 

where r[,^^ and rf-^ are contact frequencies for Feedmill and Slaughterhouse contacts re- 
spectively with associated probabilities of infection pi and p2, cf,^ is or 1 depending on 
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whether a business association exists between i and j with associated rate of infection 
/?!, pij represents the Euclidean distance in kilometres between i and j with an exponen- 
tial distance kernel with decay tp and parameter /32 interpreted as the rate of infection 
between individuals 5km apart. The indicator function l[i e I{t)] removes the network 
transmission modes during the notified period, reflecting statutory movement restrictions 
on the infected farm (European Economic Community, 1992). 

The infectivity function q{i]^,t) is defined 



'^ t>0,Ii<t<Ni 



q{i;^,t) = < 



M+e 



1 t > 0, iV, < t < i?, 



otherwise 



with // = 1.3 and /i = 60 is assumed, determined by fitting q{i; ^, t) to expert opinion, rj 
is a 10-dimensional vector of susceptibilities for each production type, such that s{j;r]) 
returns the susceptibility of the major production species on farm j. In our example, we 
assume that broiler chickens are the most susceptible, and set this to 1. All other elements 
of 77 are therefore susceptibilities relative to broilers. As in our previous work, foi-) is 
given by foit) = ab ■ exp [bt — a (e^* — l)] with a = 0.015 and b = 0.8 assumed, giving a 
mean infectious period of 6 days as determined from expert opinion. Prior distributions 
for the remaining parameters were chosen as suggested in Section 5, and are shown in 
Table 1. 

Posterior information To illustrate how the acquisition of CTD can improve posterior 
precision, we use a simulated epidemic on the GBPR dataset as described in Section 
2, using the model described in Section 3. The epidemic is realised using a stochastic 
simulation based on the Doob-Gillespie algorithm (Gillespie, 1976) applied at the level of 
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individual contacts (to generate CTD), with the extension that retrospective samphng is 
used to determine whether a contact results in an infection (Jewell et al., 2009a). 

Here we investigate how the assimilation of improves inference by increasing posterior 
precision. We base our test analysis on a typical simulated epidemic lasting 109 days 
in which 350 farms become infected, using the parameter values presented in Table 1 
(see also Supplementary Material C, Figure 1). Two timepoints were analysed, day 40 
(incomplete epidemic) and day 109 (complete epidemic), as shown in Table 2. CTD for 
a 21 day period preceeding a notification event was recorded for each notified individual. 
The algorithm was then run for the two timepoints during the epidemic, with and without 
the associated CTD. 

The density plots for parameters e, pi, p2, /3i, /32, 7, and xp are shown in Figure 
1. Of particular interest are the plots for the probability parameters pi and p2, which 
are, of course, directly informed by the contact tracing information. It is immediately 
apparent that the addition of contact tracing data affects the marginal posteriors of these 
parameters, with an increase in precision in each case. The other parameters are far less 
affected, with only minor differences apparent in the complete epidemic analysis. Density 
plots for the components of ry and histograms of the posterior number of occults present 
on day 40 are shown in the Supplementary Material C, Figures 2-4. 

Practical application To test a prospective application of our methodology, we focus 
on the statistical detection of occult infections. We postulate that this provides a method 
to target limited surveillance resource to the most likely infected individuals. A simula- 
tion study was performed in which 4 surveillance strategies were tested. In the "Reactive" 
strategy, no active surveillance is performed, and cases are notionally detected and re- 
ported by the farmer. The 3 remaining strategies use active (pre-emptive) surveillance 
to look for disease as well as case detection by the farmer, with the strategy being im- 
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plemented at day 14, mimicking the application of such a strategy once it has become 
apparent that an epidemic is taking off. "Random" surveillance represents a strategy in 
which a random sample of size z holdings within the statutory 10km surveillance zone 
are visited and tested daily (Defra, 2009). The "Bayes" targeted strategy uses our algo- 
rithm (without contact tracing data) on a daily basis to rank holdings in terms of occult 
probability; the top z holdings are then visited and tested for disease. The "Bayes-CT" 
strategy then adds in CTD to target surveillance similarly. For the purposes of our sim- 
ulation study, we assume a surveillance resource of ^ = 15. We also assume that each 
surveillance team has at their disposal a perfect test for the disease - ie 100% sensitivity 
and specificity - and that if positive, the farm is culled immediately. For each strategy, 
500 epidemic realisations with random index cases were generated to integrate over the 
stochasticity in the epidemic process and study the efficacy of each surveillance strategy. 
These were further conditioned to involve at least 2 individuals and last greater than 14 
days, such that the epidemic could be said to have "taken off". A flow diagram of the 
simulation study is shown in the Supplementary Material C, Figure 5. 

As metrics for comparing the performance of the 4 surveillance strategies, the number 
of culled holdings (both as a result of Reactive and surveillance detection) and epidemic 
duration (ie time from the first infection to the last cull with no further infected or notified 
holdings) were used. These results are summarised in Table 3 and Figure 2, and indicate 
that active surveillance as implemented here is effective in reducing the probability of 
large scale outbreaks. The Random strategy, whilst successful in reducing the mean size 
of the outbreak, does not appreciably affect the probability of a large epidemic. Using 
Bayes targetting effects both of dramatically reducing the mean epidemic size and the 
probability of a large epidemic. The addition of CTD improves the efficacy of surveillance 
further: the Bayes strategy reduces the mean epidemic size by 4-fold as compared to 
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the Reactive strategy, the Bayes-CT strategy reduces it by 10-fold. A similar picture is 
reflected in the results for epidemic duration, with the presence of a surveillance strategy 
greatly reducing the mean. As before, the effect of Bayes targeting is profound, with the 
presence of CTD reducing the mean variance of the duration distribution. These results 
suggest that Bayes targeting of surveillance using case incidence data and contact tracing 
may have much to offer in disease control resource prioritisation. 

7 Discussion 

The results presented in the previous section show contact tracing data is a useful 
addition for inference and prediction on SIR type epidemic models. The purpose of 
this methodological innovation is demonstrated well in Figure 1 showing an increase in 
posterior precision in response to the acquisition of the contact tracing information in 
parameters pi and p2- Of particular interest is the behaviour of the posterior in response 
to differing amounts of data, and for this reason the density plots should be considered 
in conjunction with Table 2. For day 40, only 1 out of 4 slaughterhouse contacts resulted 
in infection, compared to 3 out of 28 for feedmill contacts. Importantly, the low con- 
tact frequency of the slaughterhouse network (see Supplementary Material C, Figure 6; 
Supplementary Material D) means that the marginal posterior for p2 is poorly informed 
without the addition of contact tracing data. Conversely, the full epidemic dataset on day 
109 contains more contact observations, reflected in the narrower posterior distributions. 
Here, there is perhaps little difference in the marginal posteriors for pi with and without 
CTD, though the effect on p2 remains marked. 

Despite its simplistic setup, the results of the surveillance strategy simulation study 
are encouraging, lending evidence to Bayes guided surveillance being highly effective for 
reducing infection to detection time, and hence abrogating the spread of an epidemic. 
Importantly, this methodology provides a way of immediately linking targeted surveillance 

16 



to changes in outbreak dynamics due to unexpected behaviour of a new strain of the 
disease, changes in control strategy, or changes in underlying population behaviour - 
changes that may well not be apparent to the naked eye in an evolving epidemic timeseries. 

Given the marked influence that CTD has on the posterior parameter estimates, care 
must be taken to avoid biasing the results through the use of biased datasets. The 
structure of the likelihood assumes independence between the data contained in individual 
contact tracing questionnaires (ie between individuals), and therefore the model is robust 
to absence of contact tracing from random cases. This is important as the analysis 
is unbiased in the case that the speed of the disease process exceeds the capacity to 
collect contact tracing data. Source of bias, however, may well arise through a case's 
reluctance to declare certain contacts, and therefore carefully designing tactful contact 
tracing questionnaires should be of high priority. Nevertheless, in our Bayesian setup, the 
model extends naturally to include expert opinion on the level of such reporting bias, so 
an attempt to correct for it can be made. 

In a UK livestock setting, although CTD is collected and used informally by field 
operatives to identify at risk farms, there currently appears to be no formal dissemination 
of this data to analysts. However, given the strong evidence for its use presented here, we 
strongly recommend that it should be made routinely available together with case data. 
Given reliable sources of such data, therefore, we envisage that real time inference and risk 
prediction for epidemics will become commonplace in reactive disease control strategy. 

Supplementary Materials: Supplementary Material contains the proof of Theorem 
1, technicalities of the rjMCMC algorithm used in Section 5, supplementary figures, and 
an intuition for the information gained using CTD. 

Acknowledgements: We thank Professors Laura Green and Matt Keeling for helpful 
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Tables and Figures 



Parameter 


True Value 


Prior 


e 


l-^day-^ 


Gamma(0.15,5000) 


Pi 
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Beta(l,l) 
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Table 1: Transmission parameter "true" values used to simulate the test epidemic for 
contact tracing. 



Time / 
days 


Notified 
infec- 
tions 


Occult 
infec- 
tions 


Contacts resulting 
in infection 


Contacts not 

resulting in 

infection 




159 
350 


19 



Feedmill S 'house 


FeedmiU S 'house 


40 
109 


3 1 

7 5 


25 3 

78 12 



Table 2: The state of the epidemic at each observation time. True number of epidemi- 
ologically relevant (ie originating at infectious individuals) contacts are known from the 
simulation algorithm. 



Strategy 


Mean 


# culled (95% CI) 


Mean duration (95% CI) 


SOS 




203.5 (2,727) 


73.2 (14.5, 147.2) 


Random 




120.1 (2,709) 


60.0 (16.8, 122.4) 


Bayes 




48.1 (2,521) 


23.8 (14.3, 50.5) 


Bayes-CT 




19.3 (2,204) 


27.3 (14.2, 64.8) 



Table 3: Mean number of holdings culled and mean epidemic duration (time from first 
infection to last removal) for each surveillance strategy, conditional on the epidemic lasting 
longer than 14 days. CI = credible interval. 
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Figure 1: Kernel density estimates of the marginal posterior distributions of e, pi, ^2, /^i, 
(32, 'y, ip for Day 40 and Day 109. 
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(a) The distribution of the logarithm of total number of 
culled premises (reactive culling plus active surveillance culls), 
log(mean number culled) shown by dashed lines 
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Days 
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(b) The distribution of epidemic duration, mean duration shown 
by dashed lines. 

Figure 2: Histograms of total number of farms culled and epidemic duration under the 4 
control strategies in the simulation study. 
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^ A Proof of Theorem 1 

v^ Consider a time interval [0, n) which is divided into intervals At = [t — 5, t). Let Wj = {At; < 

C*^ t < n} the set of time intervals for which contact tracing is observed, and W^ its complement. 

cn Further we define the a-algebras T = a{ACijk{t);t E Wj) and Q = a {ACijk{t);t e Wj) 

^2 representing the information contained in contact tracing data, and the information lost by not 



having contact tracing data, respectively. 

If we were to observe contact tracing for all time (and not just a contact tracing window), a 
geometric likeHliood may be used to make inference on the vector of infectivity parameters £, 
susceptibility parameters rj, and pk the probability that a contact along network fc = {1, . . . , K} 
of K networks results in an infection. Letting Pijkif) = q{i] £, t)s{j] r])pk, < Pijk{t) < 1, 



K 



L^{I,N,R\e) 



n n n nt(^^^-^w)'"^'^''(i-^^^'^w) 

jev Ate[o,u) iey{t) k=i 

II fn{N,-I^)x II [l-Fn{T,bs-I: 



l-llI-jKtU^^ijkit) 



j--N<T,bs 



j:N,>T,hs 



K 



TT TT TT TTe^'^'"^*)^^'^'"^*) 

3€V At€[(.),u)i€y{t)k=l 



j:N<Tobs 



j:N,>Tobs 



where y{t) = X{t) U M{t) the set of both infected and notified individuals at time t and 
Z.jkit) = (l[/j e At] log(p,,fc(t)) + (1 - l[/j > t + 5]) log(l - p,,fc(t))) . 

Where contact tracing is not observed, for time intervals in W|, the presence or absence of a 
contact is replaced with the expectation with respect to Q 
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jev AteWj iey{t) fc=i jev Atew^ iey{t) fe=i 

' f 
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(A.l) 



We now address the question of how to calculate 



Ec 



K 



exp ^ ^ ^Zijfe(t)ACijfc(t) 



Atewr iey(t) fe=i 



(k) 

for all j. Conditional on the history T-lt of the epidemic at time t, if we know the frequency r)- 
of contacts between individuals i and j along network k, we can write the joint probability of 
j being infected via k in the notional time interval At = [t — 5, t) 



P(/j e At,AC(t) = l|7^t) = P{IjeAt\AC{t) = l,nt)P{AC{t) = l\nt) 

i€l(t) 






In terms of the probability density, this is equivalent to taking an expectation over the contact 
tracing data C such that for all j 
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AteWU€l{t) k=l 



AteWr 
+o{5) 



with \j{t) = ^^^1 T.^mt) P^lk{t)r\.' + 7 Y.^eN(t) Pvk{t) 
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(fe) 
i€N{t)l'iik\'^)' ij 



the total infectious pressure on 



j at time t. 

Notice that the first term in the square parentheses picks out intervals in which infections 
have occurred (implying also that a contact occurred), with the remaining intervals up to j's 
infection time Ij appearing in the second term. Observing that the number of intervals in which 
contacts occurs becomes a negligible proportion of the total number of intervals, the second 
terms becomes 
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This is equivalent to the form of the exponential likelihood in Equation 1, main paper. 

Taking the limit (5 — > throughout and a product over all individuals in the population - since 
in the Markovian structure of the epidemic model infection events are conditionally independent 
- the embedded contact tracing likelihood is 
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n ^^p 



i:/,6W, 
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3 iey{t) fe=i 
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Xj{t)dt 
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(A.2) 



Remarks For networks, or more generally modes of disease transmission, where contact fre- 
quency is not known pk and r\- cannot be identified. Instead, we make inference on the overall 
infection rate /3fe = Pk^lj , where it is assumed that r)- is constant with respect to both i and 
j. These modes of transmission necessarily appear in the W| terms in the likelihood, and are 
therefore unable to admit contact tracing data in our current setup. 



B MCMC Algorithm 

The following reversible jump MCMC algorithm is used to estimate the joint posterior distribu- 
tion of the Bayesian epidemic model described in Sections 4 and 5 of the main paper. It is based 
on the methodology developed in ?, but contains significant methodological enhancements to 
encompass the binomial addition to the likelihood. Specifically, since infections can now be 
caused by either an observed or unobserved contact, MCMC updates for the infection times 
need to be modified, taking into account that for any particular iteration, an infection time can 
exist in one of two states: Ij € Wj if j is infected by an observed contact, or Ij € W| if j is 
infected in a period of time for which contact tracing data is not observed. 

Defining the number of occult infections at iteration n as m^"'^ = X^jgj(n) ll(A'i > Tots), the 
algorithm proceeds as follows: 

1. Take starting values 6>(o) = {e^^\p^^\ f3'-°\j^^\'ip^^\T]^^^} and /W 

2. Update 0("+i)|/("), AT^, R("),C(") 

3. For z times choose one of: 

(a) Update an infection time: /i"+^^|li"\ AT^, i2(") with s ~ Uniform ([/(")]) 

(b) Add an infection: /("+i) = {/(") + s} with s ~ Uniform ( [5 ("']) 

(c) Delete apreviously added infection: /("+^) = {/(")— s} with s ~ Uniform([X(")(Tofos)]) 

4. Goto 2 

In step (2), the entire transmission parameter vector is updated using an adaptive multisite 
MetropoHs-Hastings step (?). The steps to move, add, or delete an infection are now described 
in detail. 
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B.l Moving an infection time 

The derivation of the contact tracing hkeUhood iniphes that if Ig is equal to the time of a known 
contact arriving at s, then the contact must have caused the infection; conversely, if Ig falls 
at any other time, then it must have been infected by a rate-driven component of the model. 
Since Is appears in different parts of the Ukelihood depending on Wj, four possibilities exists 
for moving an infection time: 

(a) Is £ Ws — > /* £ Ws'- A contact infection time is moved to another contact 
infection time. 

(b) Is £ Wg — )> /* £ Ws' Non-contact infection time is moved to another 
non-contact infection time. 

(c) Is £ Ws —)>/*£ Ws: Non-contact infection time is moved to a contact 
infection time. 

(d) Is £ Ws — > /* £ Ws- Contact infection time is moved to a non-contact 
infection time. 

Moves (c) and (d) result in Ig appearing in different parts of the likelihood, resulting in a change 
of model dimension. These moves will therefore be termed ^Hrans- dimensional" . Conversely, 
moves (a) and (b) do not change the dimension of the model and are therefore termed ^^cis- 
dimensional" . Given the current status of J^, a trans- dimensional move is proposed with 
probability 0.5, else a cis- dimensional move occurs. For proposing a contact infection time, the 
following distribution is used 

/r~ Uniform fcl")' 



where Q" is the set of s's incoming contacts (on all networks) that originate from infective 
individuals i at the current state n of the Markov chain (ie known infectives and occults). In 

6 



other words, we propose a new infection time by sampling uniformly from the contact times. 
Note that the only valid move for s : W^ = 0, which includes occults, is (b). 

In conjunction with the four possible proposal schemes, therefore, there are four possible ac- 
ceptance probabiUties for updating an infection time chosen uniformly at random: 



(a) 
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/, e Wf ^ /* e W^: Propose /; = gn{Ns - 4*) if known and I^ = goiTobs - I*) if 
occult. Accept with probability 
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(c) 



Is G W^ —)>/*£ W^: Propose /* ~ Uniform ( Cs ) and accept with probability 
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Probability densities 9_d(-) and goi') sjce used for proposing non-contact infection times for 
known and occult infections respectively. Similarly to our previous work (?), we propose from 
9D{t) = foit) for known infections, and approximate the censored density 1 — Fr){t) using a 
second order Taylor expansion 

gD{t)= ^°™^^(-^'^ 

B.2 Adding or deleting an infection time 

The addition and deletion updates are performed as in ?. Briefly, for an addition, an individual 
s is chosen uniformly from S^"'\ the set of susceptible individuals at the nth iteration of the 
Markov chain. An infection time is then proposed from the density 1 — Fo{Tobs — Ig) as for 
moving an occult infection time. The addition is accepted with probability 



7i{I(--)\N,R,f3i")) ([5H] + 1) • gni^hs - /*) 

For a deletion, an occult individual s is chosen uniformly. It's deleting is then accepted with 
probability 

^ (Tin) w„+m ^ , . VT ({/(")- 4 |Ar,fi,/3(")) [5(»)]-fe(T.,.-/.) 
^ ' ^ n{I(n)\N,R,f3(n)) [5(")]-l 

The sets Cs are consequentially updated for all j since contacts that were previously thought 
to have originated at susceptibles, might now originate at infecteds, and vice versa. 
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Figure 1: Simulated epidemic trajectory for Section 6 of the main paper showing Infected, 
Notified, and Removed individuals. 
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Figure 2: Histogram of number of occult infections for the test epidemic on day 40: with contact 
tracing (left); without contact tracing (right). True number of occult infections indicated by 
dashed line. 
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Figure 3: Production type susceptibilities (??2...9) relative to broiler chickens (r^i) for the Day 40 
example. 
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Figure 4: Production type susceptibilities (/?2...9) relative to broiler chickens (?7i) for the full 
epidemic (Day 109) example. 
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Figure 5: Dataflow diagram of surveiUance strategy simulation study. SI) basic "SOS' dis- 
ease control; S2) random surveillance within SZ; S3) Bayesian targeted surveillance, not using 
contact tracing; S4) Bayesian targeted surveillance using contact tracing data. 
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Figure 6: Histograms of the log distribution of incoming contact frequency (contacts per week) 
for the feedmiU (top) and slaughterhouse (bottom) networks, according to ?. 
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D Information gained from contact tracing data versus 
case incidence data 

To provide an intuition for the difference in added posterior information between pi and ^2, 
it is useful to consider how the increase in precision due to CTD varies as a function of the 
underlying contact rate. In a simple model in which an individual gets infected via a frequency 
network with contact rate r and infection probability p during an interval of length 5, the 
variance of the estimator p is 

Var,,p(p) = l/{r5f 

for an exponential model and 

Var,,„(p) = l/{r5f - l/{r5f 

for the geometric model. Ya.Xgeo{p) is therefore guaranteed less than YWf^xpip)- 

This gives insight into the fact that the acquisition of any contact tracing data always adds 
statistical information, and since for large r the first term in Vargeo(p) dominates, the variance 
converges for the two models as shown below in Figure 7. In other words, although the variance 
of the geometric model is always less than for the exponential model, the inclusion of CTD may 
only be worthwhile if contact rates are low. 
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Figure 7: Var(p) for both geometric and exponential models with 5 = 1. 
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